The 1$^{\text{st}}$ Workshop on Maritime Computer Vision (MaCVi) 2023 focused on maritime computer vision for Unmanned Aerial Vehicles (UAVs) and Unmanned Surface Vehicles (USVs), and organized several subchallenges in this domain: (i) UAV-based Maritime Object Detection, (ii) UAV-based Maritime Object Tracking, (iii) USV-based Maritime Obstacle Segmentation and (iv) USV-based Maritime Obstacle Detection. The subchallenges were based on the SeaDronesSee and MODS benchmarks. This report summarizes the main findings of the individual subchallenges and introduces a new benchmark, called SeaDronesSee Object Detection v2, which extends the previous benchmark by including more classes and footage. We provide statistical and qualitative analyses, and assess trends in the best-performing methodologies of over 130 submissions. The methods are summarized in the appendix. The datasets, evaluation code and the leaderboard are publicly available at https://seadronessee.cs.uni-tuebingen.de/macvi.
Vanilla unsupervised domain adaptation methods tend to optimize a model with a fixed neural architecture, which is not very practical in real-world scenarios, since target data are usually processed by different resource-limited devices. It is therefore quite necessary to facilitate architecture adaptation across various devices. In this paper, we introduce a simple framework, Slimmable Domain Adaptation, to improve cross-domain generalization with a weight-sharing model bank, from which models of different capacities can be sampled to meet different accuracy-efficiency trade-offs. The main challenge in this framework lies in simultaneously boosting the adaptation performance of numerous models in the model bank. To tackle this problem, we develop a stochastic ensemble distillation method to fully exploit the complementary knowledge in the model bank for inter-model interaction. Nevertheless, considering the optimization conflict between inter-model interaction and model adaptation, we augment the existing bi-classifier domain confusion architecture into an optimization-separated tri-classifier counterpart. After optimizing the model bank, architecture adaptation is carried out via our proposed unsupervised performance evaluation metric. Under various resource constraints, our framework surpasses other competing methods by a large margin on multiple benchmarks. It is also worth emphasizing that our framework can preserve the performance improvement over the source-only model even when the computational complexity is reduced to $1/64$. The code will be available at https://github.com/hikvision-research/slimda.
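As a rough illustration of the stochastic ensemble distillation idea described above, the sketch below samples a few sub-networks of different widths from a weight-sharing model bank, averages their softened predictions on a target batch to form an ensemble, and distills every sampled sub-network toward that ensemble. The `model_bank(x, width=...)` interface, the sampling of widths, and the temperature are assumptions made for illustration, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def ensemble_distillation_step(model_bank, sample_widths, images, optimizer, temperature=2.0):
    """One illustrative training step of stochastic ensemble distillation.

    model_bank(x, width=w) is assumed to run the sub-network of the given width
    (hypothetical interface); sample_widths is the list of widths drawn this step.
    """
    # Forward the sampled sub-networks on the same target batch.
    logits = [model_bank(images, width=w) for w in sample_widths]

    # Ensemble (teacher) distribution: average of softened sub-network predictions.
    with torch.no_grad():
        probs = [F.softmax(l.detach() / temperature, dim=1) for l in logits]
        ensemble = torch.stack(probs).mean(dim=0)

    # Distill every sampled sub-network toward the ensemble prediction.
    loss = sum(
        F.kl_div(F.log_softmax(l / temperature, dim=1), ensemble, reduction="batchmean")
        for l in logits
    ) / len(logits)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```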
Semi-supervised object detection has made significant progress with the development of mean-teacher-driven self-training. Despite the promising results, the label mismatch problem has not been fully explored in previous works, which leads to severe confirmation bias during self-training. In this paper, we propose a simple yet effective label-matching framework from two different yet complementary perspectives, i.e., distribution level and instance level. For the former, the class distribution of the unlabeled data can be reasonably approximated from that of the labeled data according to Monte Carlo sampling. Guided by this weakly supervised cue, we introduce a re-distribution mean teacher, which leverages adaptive label-distribution-aware confidence thresholds to generate unbiased pseudo labels to drive student learning. For the latter, there is an overlooked label assignment ambiguity problem across the teacher and student models. To tackle this issue, we propose a new label assignment mechanism for the self-training framework, namely proposal self-assignment, which injects the proposals from the student into the teacher and generates accurate pseudo labels to match each proposal in the student model accordingly. Experiments on the MS-COCO and PASCAL-VOC datasets demonstrate the considerable superiority of our proposed framework over other state-of-the-art methods. The code will be available at https://github.com/hikvision-research/ssod.
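A minimal sketch of how label-distribution-aware confidence thresholds could be derived is given below: the class distribution estimated from the labeled set determines how many pseudo boxes of each class should be kept on the unlabeled set, and the per-class threshold is set at the corresponding confidence rank of the teacher's predictions. The variable names and the exact budgeting rule are assumptions, not the paper's precise algorithm.

```python
import numpy as np

def adaptive_thresholds(labeled_class_counts, unlabeled_scores,
                        expected_boxes_per_image, num_unlabeled_images):
    """Illustrative per-class confidence thresholds for pseudo labeling.

    labeled_class_counts: array [C] of ground-truth box counts per class in the labeled set.
    unlabeled_scores: dict class_id -> confidences of teacher predictions on the
    unlabeled set, sorted in descending order (class ids assumed to index the array).
    """
    class_ratio = labeled_class_counts / labeled_class_counts.sum()
    # Target number of pseudo boxes per class on the unlabeled set.
    target_counts = class_ratio * expected_boxes_per_image * num_unlabeled_images

    thresholds = {}
    for c, scores in unlabeled_scores.items():
        k = int(min(target_counts[c], len(scores)))
        # Keep roughly the top-k most confident predictions of class c.
        thresholds[c] = scores[k - 1] if k > 0 else 1.0
    return thresholds
```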
Self-training for unsupervised domain-adaptive object detection is a challenging task, whose performance depends heavily on the quality of pseudo boxes. Despite the promising results, prior works have largely overlooked the uncertainty of pseudo boxes during self-training. In this paper, we present a simple yet effective framework, termed Probabilistic Teacher (PT), which aims to capture the uncertainty of unlabeled target data from a gradually evolving teacher and to guide the learning of the student in a mutually beneficial manner. Specifically, instead of filtering pseudo boxes via an elaborately designed confidence threshold, we propose to leverage uncertainty-guided consistency training to promote both classification adaptation and localization adaptation. In addition, we conduct anchor adaptation in parallel with localization adaptation, since anchors can be regarded as learnable parameters. Together with this framework, we also present a novel Entropy Focal Loss (EFL) to further facilitate the uncertainty-guided self-training. Equipped with EFL, PT outperforms all previous baselines and achieves new state-of-the-art results.
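To make the contrast with hard thresholding concrete, the sketch below shows an uncertainty-guided classification consistency loss: the student is trained to match the teacher's full soft class distribution for each region, so the teacher's uncertainty is retained rather than being discarded by a confidence cutoff. This is an illustrative reading of the abstract; it does not reproduce the paper's exact losses (including EFL) or the localization and anchor adaptation.

```python
import torch
import torch.nn.functional as F

def soft_consistency_loss(student_logits, teacher_logits, temperature=1.0):
    """Cross-entropy of student predictions against the teacher's soft class
    distribution, averaged over region proposals (illustrative, not the paper's loss)."""
    teacher_probs = F.softmax(teacher_logits.detach() / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    return -(teacher_probs * student_log_probs).sum(dim=-1).mean()
```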
Inspired by the remarkable zero-shot generalization capacity of vision-language pre-trained models, we seek to leverage the supervision of the CLIP model to alleviate the burden of data labeling. However, such supervision inevitably contains label noise, which significantly degrades the discriminative power of the classification model. In this work, we propose Transductive CLIP, a novel framework for learning a classification network with noisy labels from scratch. First, a class-conditional contrastive learning mechanism is proposed to mitigate the reliance on pseudo labels and boost the tolerance to noisy labels. Second, ensemble labels are adopted as a pseudo-label updating strategy to stabilize the training of deep neural networks with noisy labels. By combining the two techniques, this framework can effectively reduce the impact of the noisy labels from the CLIP model. Experiments on multiple benchmark datasets demonstrate substantial improvements over other state-of-the-art methods.
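The ensemble-label idea can be sketched as an exponential moving average of per-sample soft predictions that is used to refresh the pseudo labels, which smooths out the noise of any single labeling pass (e.g., the initial CLIP zero-shot labels). The EMA form and the momentum value below are assumptions, not the paper's exact update rule.

```python
import torch
import torch.nn.functional as F

class EnsembleLabels:
    """Per-sample soft labels maintained as an EMA of classifier predictions (sketch)."""

    def __init__(self, num_samples, num_classes, momentum=0.9):
        # Start from a uniform distribution for every sample.
        self.soft_labels = torch.full((num_samples, num_classes), 1.0 / num_classes)
        self.momentum = momentum

    def update(self, indices, logits):
        # Blend the current predictions into the running soft labels.
        probs = F.softmax(logits.detach(), dim=-1).cpu()
        self.soft_labels[indices] = (
            self.momentum * self.soft_labels[indices] + (1 - self.momentum) * probs
        )

    def pseudo_labels(self, indices):
        # Hard pseudo labels read off the ensembled soft labels.
        return self.soft_labels[indices].argmax(dim=-1)
```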
With the development of machine learning and data science, data sharing between companies and research institutes has become very common as a way to avoid data scarcity. However, sharing original datasets that contain private information can cause privacy leakage. A reliable solution is to use private synthetic datasets that preserve the statistical information of the original datasets. In this paper, we propose MC-GEN, a privacy-preserving synthetic data generation method under a differential privacy guarantee for machine learning classification tasks. MC-GEN applies multi-level clustering and a differentially private generative model to improve the utility of the synthetic data. In the experimental evaluation, we studied the effects of MC-GEN's parameters and assessed its effectiveness. The results show that MC-GEN achieves strong effectiveness under given privacy guarantees on multiple classification tasks. Moreover, we compared MC-GEN with three existing methods; the results show that MC-GEN outperforms them in terms of utility.
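A very rough sketch of the cluster-then-perturb idea is shown below: the data are clustered per class, each cluster's mean is privatized with the Laplace mechanism, and synthetic points are sampled around the noisy means. The sensitivity handling, the use of non-privatized standard deviations, and the single-level clustering are simplifications, so this sketch does not carry MC-GEN's formal differential privacy guarantee or its multi-level scheme.

```python
import numpy as np
from sklearn.cluster import KMeans

def synthesize(X, y, n_clusters=5, epsilon=1.0, sensitivity=1.0, rng=None):
    """Generate a synthetic dataset by perturbing per-cluster means (illustrative only)."""
    if rng is None:
        rng = np.random.default_rng(0)
    synth_X, synth_y = [], []
    for label in np.unique(y):
        Xc = X[y == label]
        km = KMeans(n_clusters=min(n_clusters, len(Xc)), n_init=10).fit(Xc)
        for k in range(km.n_clusters):
            members = Xc[km.labels_ == k]
            # Laplace mechanism on the cluster mean (per-feature sensitivity assumed bounded).
            noisy_mean = members.mean(axis=0) + rng.laplace(
                scale=sensitivity / (epsilon * len(members)), size=Xc.shape[1]
            )
            # Sample synthetic points around the noisy mean (spread is not privatized here).
            samples = rng.normal(loc=noisy_mean, scale=members.std(axis=0) + 1e-6,
                                 size=(len(members), Xc.shape[1]))
            synth_X.append(samples)
            synth_y.append(np.full(len(members), label))
    return np.vstack(synth_X), np.concatenate(synth_y)
```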
Driven by improved architectures and better representation learning frameworks, the field of visual recognition has enjoyed rapid modernization and performance boost in the early 2020s. For example, modern ConvNets, represented by ConvNeXt, have demonstrated strong performance in various scenarios. While these models were originally designed for supervised learning with ImageNet labels, they can also potentially benefit from self-supervised learning techniques such as masked autoencoders (MAE). However, we found that simply combining these two approaches leads to subpar performance. In this paper, we propose a fully convolutional masked autoencoder framework and a new Global Response Normalization (GRN) layer that can be added to the ConvNeXt architecture to enhance inter-channel feature competition. This co-design of self-supervised learning techniques and architectural improvement results in a new model family called ConvNeXt V2, which significantly improves the performance of pure ConvNets on various recognition benchmarks, including ImageNet classification, COCO detection, and ADE20K segmentation. We also provide pre-trained ConvNeXt V2 models of various sizes, ranging from an efficient 3.7M-parameter Atto model with 76.7% top-1 accuracy on ImageNet, to a 650M Huge model that achieves a state-of-the-art 88.9% accuracy using only public training data.
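For reference, a Global Response Normalization layer can be written in a few lines: per-channel global L2 aggregation over the spatial dimensions, divisive normalization across channels, and a learnable affine calibration with a residual connection. The sketch below assumes the channels-last tensor layout used inside ConvNeXt blocks; initialization and numerical details may differ from the released code.

```python
import torch
import torch.nn as nn

class GRN(nn.Module):
    """Global Response Normalization layer (sketch), input shape (N, H, W, C)."""

    def __init__(self, dim):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1, 1, 1, dim))
        self.beta = nn.Parameter(torch.zeros(1, 1, 1, dim))

    def forward(self, x):
        # Global aggregation: L2 norm of each channel over the spatial dimensions.
        gx = torch.norm(x, p=2, dim=(1, 2), keepdim=True)
        # Divisive normalization across channels.
        nx = gx / (gx.mean(dim=-1, keepdim=True) + 1e-6)
        # Learnable calibration plus residual connection.
        return self.gamma * (x * nx) + self.beta + x
```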
In this tutorial paper, we look into the evolution and prospect of network architecture and propose a novel conceptual architecture for the 6th generation (6G) networks. The proposed architecture has two key elements, i.e., holistic network virtualization and pervasive artificial intelligence (AI). The holistic network virtualization consists of network slicing and digital twin, from the aspects of service provision and service demand, respectively, to incorporate service-centric and user-centric networking. The pervasive network intelligence integrates AI into future networks from the perspectives of networking for AI and AI for networking, respectively. Building on holistic network virtualization and pervasive network intelligence, the proposed architecture can facilitate three types of interplay, i.e., the interplay between digital twin and network slicing paradigms, between model-driven and data-driven methods for network management, and between virtualization and AI, to maximize the flexibility, scalability, adaptivity, and intelligence for 6G networks. We also identify challenges and open issues related to the proposed architecture. By providing our vision, we aim to inspire further discussions and developments on the potential architecture of 6G.
We present Second Thought, a new learning paradigm that enables language models (LMs) to re-align with human values. By modeling the chain-of-edits between value-unaligned and value-aligned text, with LM fine-tuning and additional refinement through reinforcement learning, Second Thought not only achieves superior performance on three value alignment benchmark datasets but also shows strong human-value transfer learning ability in few-shot scenarios. The generated editing steps also offer better interpretability and make interactive error correction easier. Extensive human evaluations further confirm its effectiveness.
Unbiased learning to rank (ULTR) studies the problem of mitigating various biases in implicit user feedback data such as clicks, and has been receiving considerable attention recently. A popular ULTR approach for real-world applications uses a two-tower architecture, where click modeling is factorized into a relevance tower with regular input features, and a bias tower with bias-relevant inputs such as the position of a document. A successful factorization allows the relevance tower to be exempt from biases. In this work, we identify a critical issue overlooked by existing ULTR methods: the bias tower can be confounded with the relevance tower via the underlying true relevance. In particular, the positions were determined by the logging policy, i.e., the previous production model, which possesses relevance information. We give both theoretical analysis and empirical results to show the negative effects of such a correlation on the relevance tower. We then propose three methods to mitigate the negative confounding effects by better disentangling relevance and bias. Empirical results on both controlled public datasets and a large-scale industry dataset show the effectiveness of the proposed approaches.
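The two-tower factorization itself can be sketched compactly: a relevance tower scores regular query-document features, a bias tower scores the display position, and the two logits are added to model the click probability, with only the relevance tower used for ranking at serving time. Layer sizes and the purely additive combination below are common simplifying choices, not the production setup, and the proposed confounding mitigations are not reproduced here.

```python
import torch
import torch.nn as nn

class TwoTowerClickModel(nn.Module):
    """Additive two-tower click model (sketch): click logit = relevance logit + bias logit."""

    def __init__(self, feature_dim, num_positions, hidden=64):
        super().__init__()
        self.relevance_tower = nn.Sequential(
            nn.Linear(feature_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )
        # Bias tower: one learnable logit per display position.
        self.bias_tower = nn.Embedding(num_positions, 1)

    def forward(self, features, positions):
        relevance_logit = self.relevance_tower(features).squeeze(-1)
        bias_logit = self.bias_tower(positions).squeeze(-1)
        # Train against observed clicks, e.g., with nn.BCEWithLogitsLoss.
        return relevance_logit + bias_logit

    def score(self, features):
        # At serving time only the relevance tower is used for ranking.
        return self.relevance_tower(features).squeeze(-1)
```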